This article discusses the problem of estimation of parameters in
finite mixtures when the mixture components are assumed to be sym- 
metric and to come from the same location family. We refer to these 
mixtures as semi-parametric because no additional assumptions other 
than symmetry are made regarding the parametric form of the com- 
ponent distributions. Because the class of symmetric distributions is 
so broad, identifiability of parameters is a major issue in these mix- 
tures. We develop a notion of identifiability of finite mixture models, 
which we call k-identifiability, where k denotes the number of components
in the mixture. We give sufficient conditions for k-identifiability
of location mixtures of symmetric components when k = 2 or 3. We
propose a novel distance-based method for estimating the (location
and mixing) parameters from a k-identifiable model and establish
the strong consistency and asymptotic normality of the estimator. 
In the specific case of L2-distance, we show that our estimator gen- 
eralizes the Hodges-Lehmann estimator. We discuss the numerical 
implementation of these procedures, along with an empirical esti- 
mate of the component distribution, in the two-component case. In 
comparisons with maximum likelihood estimation assuming normal 
components, our method produces somewhat higher standard error 
estimates in the case where the components are truly normal, but 
dramatically outperforms the normal method when the components 
are heavy-tailed. 



1. Introduction. Given a random sample $X_1, \ldots, X_n$ from a symmetric
distribution, Hodges and Lehmann [8] proposed an estimator for the center
of symmetry, $\mu$, that consists of the median of all $n + \binom{n}{2}$ pairwise means
$(X_i + X_j)/2$ for $i \le j$. By the well-known property that a sample median
minimizes the sum of absolute deviations from all the points in the sample,
we may express this Hodges-Lehmann estimator as

$$\hat\mu_{HL} = \arg\min_{\mu} \mathop{\sum\sum}_{i \le j} \left| \frac{X_i + X_j}{2} - \mu \right|. \tag{1}$$



This article extends the idea of Hodges and Lehmann to a more general 
setting in which the sample $X_1, \ldots, X_n$ comes not from a symmetric distribution,
but from a finite mixture of location-shifted symmetric distributions.
Yet, we do far more in this article than generalize the Hodges-Lehmann es- 
timator. We propose a general method of estimation for location mixtures of 
symmetric components and discuss the central issue of identifiability, with- 
out which the very concept of estimation in these models is ill-defined. 
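
As a concrete illustration of the Hodges-Lehmann estimator of equation (1), the following Python sketch computes the median of the pairwise means. It is an illustration only: the function name `hodges_lehmann` is ours, and the convention i <= j is assumed from the stated count of n + n(n-1)/2 pairwise means.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(sample):
    """Median of the n + n(n-1)/2 pairwise means (X_i + X_j)/2 with i <= j."""
    # combinations_with_replacement yields each unordered pair once, including i == j.
    pairwise_means = [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]
    return median(pairwise_means)


# Three observations give 3 + 3 = 6 pairwise means: 1, 1.5, 2, 3.5, 4, 6.
print(hodges_lehmann([1, 2, 6]))  # 2.75
```

The quadratic number of pairwise means makes this direct version suitable only for moderate sample sizes.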

As motivating examples, consider the two samples depicted in Figure 1. In 
the Old Faithful dataset, measurements give time in minutes between erup- 
tions of the Old Faithful geyser in Yellowstone National Park, USA. These 
data are included as part of the R and S-PLUS statistics packages [type
"help(faithful)" in R or "help(geyser)" in S-PLUS for more details]. For the
Old Faithful eruption data, a two-component mixture model is clearly rea- 
sonable. However, in the case of the double exponential dataset, this fact 
is not so clear. These data were simulated from a 2-component mixture of 
location-shifted double exponential distributions. A common choice for fit- 
ting parameters in a 2-component mixture when nothing is known about 
the shape of the component distributions is to use maximum likelihood es- 
timation based on normally distributed components. For the Old Faithful 
data, the method we propose in this article performs nearly identically to the
normal method; furthermore, our method enables a validation of the normality
assumption by providing a nonparametric estimate of the distribution
function of the mixture components. For the simulated double exponential
mixture on the right in Figure 1, estimates of $\mu_1 = -1$, $\mu_2 = 1$ and $\lambda_1 = 0.3$
are $(-1.04, 0.97, 0.33)$ for our method and $(-5.48, 0.33, 0.006)$ for the normal
method. In these examples, our method complements the likelihood-based
normal methods when they appear to be appropriate and it outperforms the
normal methods when they are not appropriate. We explore this comparison
in Sections 5 and 6.

Fig. 1. The Old Faithful dataset is clearly suggestive of a 2-component mixture of symmetric
components. The data on the right are simulated from a 2-component mixture of
double exponential distributions with centers $\mu_1 = -1$ and $\mu_2 = 1$ and mixing parameters
$\lambda_1 = 0.3$ and $\lambda_2 = 0.7$.

But before we discuss the application of our method to data, there is much 
preparation to be done. As a first step, we formally specify the model for the 
data in Section 2. We also discuss the all-important topic of identifiability in 
Section 2. Parameter estimation is the topic of Section 3, where we propose 
a class of estimators and prove that they are strongly consistent. Section 4 
explores the connection between a particular form of our estimator and the 
Hodges-Lehmann estimator, exploiting this connection to establish asymp- 
totic normality. Section 5 examines the numerical implementation of our 
estimation method, giving several examples. Finally, the discussion in Sec- 
tion 6 compares our estimation method to the canonical estimation method 
for problems of this type, namely, maximum likelihood estimation assuming 
a mixture of normal distributions. Technical details about identifiability and 
proofs of strong consistency are given in Appendices A and B, respectively. 

2. The model and identifiability. Suppose that $X_1, \ldots, X_n$ are independent
and identically distributed from a $k$-component mixture distribution
with distribution function

$$F(x) = \sum_{j=1}^{k} \lambda_j G(x - \mu_j) \tag{2}$$

for some distribution function $G(x)$ that is completely unspecified, except
for the assumption that $G$ is symmetric about zero, that is, $G(x) = 1 - G(-x)$
for all continuity points $x$ of $G$. In this article, we denote by $\mathcal{S}$ the set of all
distributions symmetric about zero and we refer to such distributions as zero-symmetric.
The shifted distributions $\{G(x - \mu) : G \in \mathcal{S}, \mu \in \mathbb{R}\}$ are referred
to as symmetric distributions. We assume throughout that $k$ is fixed and
known. Such an assumption is often justified on the basis of theory specific
to the application at hand; see, for example, [7].

The distribution of equation (2) may be written as the convolution of
$G$ with a distribution supported on the $k$ points $(\mu_1, \ldots, \mu_k)$. Because such
finite distributions arise frequently in this article, we introduce a notation
for them. For $\lambda = (\lambda_1, \ldots, \lambda_k)$ and $\mu = (\mu_1, \ldots, \mu_k)$ such that $\lambda_j \ge 0$ for all
$j$ and $\sum_j \lambda_j = 1$, define

$$\Delta_k(\lambda, \mu)(x) = \sum_{j=1}^{k} \lambda_j \delta_{\mu_j}(x),$$

where $\delta_t(x) = I\{t \le x\}$ denotes the distribution function which assigns mass 1
to the point $t$. We sometimes abuse notation and refer to the distribution
$\Delta_k(\lambda, \mu)$ without its argument $x$. Thus, we may rewrite equation (2) as

$$F = G \star \Delta_k(\lambda, \mu). \tag{3}$$

Throughout this article, we use the convention that a distribution function
superscripted with a minus sign denotes the result of that distribution being
reflected over the origin. For example, $\Delta_k^-(\lambda, \mu)$ denotes the same distribution
as $\Delta_k(\lambda, -\mu)$.
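
The convolution form of equation (3) says that an observation can be generated as X = M + Z, where M is drawn from the k-point distribution and Z from the zero-symmetric G. The Python sketch below illustrates this under stated assumptions: the function names are ours, and the double exponential choice of G with the Figure 1 parameter values is used only as an example.

```python
import random


def sample_location_mixture(n, weights, centers, draw_symmetric, rng):
    """Draw n values X = M + Z with M from the k-point distribution and Z from G."""
    return [rng.choices(centers, weights=weights)[0] + draw_symmetric(rng)
            for _ in range(n)]


def draw_double_exponential(rng):
    """Standard double exponential draw: a random sign times an Exp(1) draw."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)


rng = random.Random(2024)
x = sample_location_mixture(20000, [0.3, 0.7], [-1.0, 1.0], draw_double_exponential, rng)
# The mixture mean is 0.3 * (-1) + 0.7 * 1 = 0.4, so the sample mean should be near 0.4.
print(sum(x) / len(x))
```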

Compared to the large statistical literature regarding mixture models in 
which the component distributions are assumed to come from a known para- 
metric family, relatively little work has been done in the case where minimal 
assumptions are made regarding G. Hettmansperger and Thomas [7] report 
promising results when the component distributions are multivariate, by re- 
ducing the model to a mixture of multinomials using cutpoints. In their 
case, the data consist of vectors of repeated measures, assumed to be in- 
dependent and identically distributed, conditional on the component from 
which they are drawn. Hall and Zhou [6] discuss a related situation in which 
the repeated measures are independent but not identically distributed and 
the mixture has two components. Identifiability issues make both of these 
approaches impossible in the univariate (nonrepeated measures) case. 

Here, we take a qualitatively different approach. We consider univariate, 
rather than multivariate, data and we achieve identifiability by imposing a 
symmetry restriction on the individual components; see [3] for an alternative
approach to the same problem. Cruz-Medina and Hettmansperger [4]
apply the cutpoint approach of Hettmansperger and Thomas [7] to a similar
case in which the component distributions are unimodal and continuous
(conditions we do not assume here) in addition to being symmetric. Ellis
[5] considers the problem of deconvolving $F = G \star Q$ into a symmetric part
$G$ and a nonsymmetric part $Q$, but without the assumption that $Q$ is a
$k$-point distribution. Finally, Walther [14, 15] considers the problem not of
estimation, but rather detection of the presence of mixing for a univariate 
distribution under very minimal assumptions on the component distribu- 
tions, although these assumptions are quite different from the assumptions 
we make here. 

MIXTURES OF SYMMETRIC DISTRIBUTIONS

The issue of identifiability looms large in the study of mixture models:
in order for parameter estimation to make any sense, we must be assured
that the parameters are uniquely determined by the mixture. Here, "the
mixture" is $F(x)$ from equation (3). Clearly, a permutation applied to the
entries of $\lambda$ and $\mu$ does not change $F(x)$, but this particular identifiability
conundrum, often called the "label-switching" problem, is easily solved in
this case by insisting that $\mu_1 < \cdots < \mu_k$. Thus, we define the parameter space
of interest to be $\Omega_k \times \mathcal{S}$, where

$$\Omega_k = \Bigl\{(\lambda, \mu) : \lambda_j \ge 0, \ \sum_{j} \lambda_j = 1, \ \mu_1 < \cdots < \mu_k\Bigr\}$$

and $\mathcal{S}$ is the set of all zero-symmetric probability distributions on the real
numbers. Furthermore, let

$$\mathcal{M}_k = \Bigl\{F : F(x) = \sum_{j=1}^{k} \lambda_j G(x - \mu_j), \ (\lambda, \mu) \in \Omega_k, \ G \in \mathcal{S}\Bigr\}$$

be the set of all mixture distributions defined by equation (3).

Typically, "identifiability" is a property of the whole set of distributions
defined by a mixture model [10, 11, 13, 16]. Thus, to adopt the traditional
view is to view identifiability of $\mathcal{M}_k$ as an "all-or-nothing" proposition: either
$\mathcal{M}_k$ is identifiable, or it is not. (From this perspective, for $k > 1$, it is
not.) However, we prefer to define identifiability as a property of individual
distributions in $\mathcal{M}_k$. This view makes it possible to refer to subsets of
identifiable mixture distributions within $\mathcal{M}_k$.

To develop this notion of identifiability, let $\varphi_k : \Omega_k \times \mathcal{S} \to \mathcal{M}_k$ denote the
function that maps $(\lambda, \mu, G)$ onto $G \star \Delta_k(\lambda, \mu)$. Essentially, identifiability
means that $\varphi_k$ should be a one-to-one (i.e., an invertible) function. Let

$$\varphi_k^{-1}(F) = \{(\lambda, \mu, G) \in \Omega_k \times \mathcal{S} : \varphi_k(\lambda, \mu, G) = F\}$$

denote the inverse image of $F$ under $\varphi_k(\cdot)$. Although $\varphi_k^{-1}(F)$ is not always
a singleton for $F \in \mathcal{M}_k$, there are elements $F \in \mathcal{M}_k$ for which $\varphi_k^{-1}(F)$ is
a singleton and these are precisely the $k$-component mixtures we consider
$k$-identifiable.

Definition 1. If $F \in \mathcal{M}_k$ has the property that $\varphi_k^{-1}(F)$ contains a
single element of $\Omega_k \times \mathcal{S}$, then $F$ is said to be identifiable as a k-component
mixture of distributions from a symmetric location family. Alternatively, we
say that such an $F$ is k-identifiable.

In estimation, the primary interest is typically in the values of $\lambda$ and
$\mu$. Thus, we turn our attention to the largest subset $\Omega'_k \subset \Omega_k$ such that
the image of $\Omega'_k \times \mathcal{S}$ under the map $\varphi_k$ consists entirely of $k$-identifiable
distributions:

$$\Omega'_k = \{(\lambda, \mu) \in \Omega_k : G \star \Delta_k(\lambda, \mu) \text{ is } k\text{-identifiable for all } G \in \mathcal{S}\}. \tag{4}$$

Note that $\Omega'_k$ is a proper subset of $\Omega_k$ for $k \ge 2$ because no element
of $\Omega_k$ for which some $\lambda_j = 0$ can be in $\Omega'_k$. By the same reasoning, a $k$-component
model can always be made into a $(k + \ell)$-component model,
$\ell > 0$, by adding $\ell$ components with zero weight. Therefore, no distribution
can be $k$-identifiable for more than one value of $k$.

Even if all $\lambda_j$ are nonzero, a distribution is not necessarily $k$-identifiable.
As an example, let $G_1(t) = \frac12\delta_{-1}(t) + \frac12\delta_{1}(t)$ be the zero-symmetric distribution
with jumps of $\frac12$ at $-1$ and $1$. We see that $G_1(t - 1)$ assigns equal mass to
the points 0 and 2, whereas $G_1(t - 5)$ assigns equal mass to 4 and 6. Now, let
$G_2(t) = \frac12\delta_{-2}(t) + \frac12\delta_{2}(t)$ be the zero-symmetric distribution with jumps of $\frac12$
at $-2$ and $2$. This implies that $G_2(t - 2)$ assigns equal mass to the points 0
and 4, whereas $G_2(t - 4)$ assigns equal mass to 2 and 6. We conclude that the
mixtures $\frac12 G_1(t - 1) + \frac12 G_1(t - 5)$ and $\frac12 G_2(t - 2) + \frac12 G_2(t - 4)$ both assign
mass $\frac14$ to the points 0, 2, 4 and 6. That is, we have expressed a particular distribution
as a 2-component mixture in two distinct ways, which means that
$F(t) = \frac12 G_2(t - 2) + \frac12 G_2(t - 4)$ is not 2-identifiable. Yet, even without decomposing
$F(t)$ in two distinct ways, we can immediately see that it cannot be
2-identifiable by noting that it is itself a symmetric distribution and therefore
1-identifiable (recall that no distribution can be $k$-identifiable for more than
one $k$). Thus, only asymmetric elements of $\mathcal{M}_2$ can be 2-identifiable. We
prove in Theorem 2 that asymmetry is not only necessary but also sufficient.
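
The two decompositions in the example above can be checked mechanically. The following Python sketch (function and variable names are ours) uses exact fractions to build the point masses of each mixture and confirms that both place mass 1/4 on each of 0, 2, 4 and 6.

```python
from collections import defaultdict
from fractions import Fraction

HALF = Fraction(1, 2)


def location_mixture_pmf(weights, centers, base_pmf):
    """Point masses of sum_j weights[j] * G(t - centers[j]) for a discrete G."""
    pmf = defaultdict(Fraction)
    for lam, mu in zip(weights, centers):
        for point, mass in base_pmf.items():
            pmf[point + mu] += lam * mass
    return dict(pmf)


g1 = {-1: HALF, 1: HALF}  # G_1: jumps of 1/2 at -1 and 1
g2 = {-2: HALF, 2: HALF}  # G_2: jumps of 1/2 at -2 and 2

first = location_mixture_pmf([HALF, HALF], [1, 5], g1)
second = location_mixture_pmf([HALF, HALF], [2, 4], g2)
print(first == second)  # True
```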

As expressed by equation (3), a location mixture may be written as a 
convolution. By exploiting the fact that convolution corresponds to multi- 
plication of characteristic functions, we prove in Appendix A the following 
simple characterization of the set $\Omega'_k$. Recall that $\Delta_k^-(\lambda, \mu)$ denotes the
reflection of $\Delta_k(\lambda, \mu)$ about the origin.

Theorem 1. For $k \ge 1$, $\Omega'_k$ defined in equation (4) is the set of $(\lambda, \mu) \in \Omega_k$
such that $\Delta_k^-(\lambda, \mu)$ is the unique k-point distribution that yields a
zero-symmetric distribution when convolved with $\Delta_k(\lambda, \mu)$.

It remains to describe $\Omega'_k$ explicitly for certain values of $k$. It is not difficult
to see that $\Omega'_1 = \Omega_1$. For the case $k = 2$, $\Omega'_2$ cannot contain any $(\lambda, \mu)$ for
which $\lambda_1 = 0$, $\lambda_1 = \frac12$ or $\lambda_1 = 1$ since those values make the mixture itself
symmetric. But the symmetric mixtures in the case $k = 2$ are the only ones
that are not 2-identifiable, as Theorem 2 states.

Theorem 2. $\Omega'_2 = \{(\lambda, \mu) \in \Omega_2 : \lambda_1 \notin \{0, \frac12, 1\}\}$. Furthermore, every
2-identifiable mixture $F \in \mathcal{M}_2$ can be expressed as $G \star \Delta_2(\lambda, \mu)$ for some
$(\lambda, \mu) \in \Omega'_2$ and $G \in \mathcal{S}$.

The second statement in Theorem 2 may appear trivial at first glance. Yet,
it is not immediately clear for general $k$ whether there exist $(\lambda', \mu') \in \Omega_k \setminus \Omega'_k$
and $G \in \mathcal{S}$ such that $G \star \Delta_k(\lambda', \mu')$ is $k$-identifiable. Given an arbitrary
$G \in \mathcal{S}$, the definition of $\Omega'_k$ merely states that $(\lambda, \mu) \in \Omega'_k$ is a sufficient
condition, not a necessary condition, that $G \star \Delta_k(\lambda, \mu)$ be $k$-identifiable.
Theorem 2 states that for $k = 2$, it is also a necessary condition.

Unfortunately, the situation is not as straightforward when $k > 2$ as it
is when $k = 2$. For instance, the set $\Omega'_k$ does not contain all $(\lambda, \mu) \in \Omega_k$
such that $\Delta_k(\lambda, \mu)$ is asymmetric and all $\lambda_j$ are nonzero. Furthermore, we
do not know whether there exist $k$-identifiable distributions $F$ such that
$\varphi_k^{-1}(F)$ is not in $\Omega'_k \times \mathcal{S}$. Because it is somewhat complicated, we have put
the statement of the explicit form of $\Omega'_3$ into Appendix A as Theorem A.1.
Here, we offer only the following sufficient (but certainly not necessary)
condition for membership in $\Omega'_3$.

Corollary 1. $(\lambda, \mu) \in \Omega'_3$ if $\lambda_1\lambda_2\lambda_3 \ne 0$ and
$(\mu_2 - \mu_1)/(\mu_3 - \mu_2) \notin \{\frac13, \frac12, 1, 2, 3\}$.

The stipulations in Corollary 1 that $\lambda_1\lambda_2\lambda_3 \ne 0$ and $(\mu_2 - \mu_1)/(\mu_3 - \mu_2) \ne 1$
ensure that $\Delta_3(\lambda, \mu)$ cannot itself be symmetric; the stipulation that the
larger of $\mu_2 - \mu_1$ and $\mu_3 - \mu_2$ cannot be two or three times the smaller
eliminates two troublesome special cases (the only two, it turns out) which
are given in equations (A.2)-(A.5).
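
The conditions of Theorem 2 and Corollary 1 are simple enough to check in code. The sketch below is illustrative only: the function names are ours, and exact rational arithmetic is assumed (pass integers, strings such as "0.3", or `Fraction` objects) so that the excluded ratios are compared without rounding error.

```python
from fractions import Fraction


def in_omega2_prime(lam1):
    """Theorem 2: (lambda, mu) is in Omega'_2 iff lambda_1 is not 0, 1/2 or 1."""
    return Fraction(lam1) not in (Fraction(0), Fraction(1, 2), Fraction(1))


def satisfies_corollary_1(lams, mus):
    """Sufficient (not necessary) condition of Corollary 1 for Omega'_3."""
    lam = [Fraction(v) for v in lams]
    mu1, mu2, mu3 = (Fraction(v) for v in mus)
    if not (mu1 < mu2 < mu3) or sum(lam) != 1 or min(lam) < 0:
        raise ValueError("expected weights on the simplex and mu1 < mu2 < mu3")
    if 0 in lam:
        return False
    ratio = (mu2 - mu1) / (mu3 - mu2)
    excluded = {Fraction(1, 3), Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3)}
    return ratio not in excluded


print(in_omega2_prime("0.3"))                                   # True
print(satisfies_corollary_1(["0.2", "0.3", "0.5"], [0, 1, 3]))  # False (ratio 1/2)
print(satisfies_corollary_1(["0.2", "0.3", "0.5"], [0, 1, 5]))  # True (ratio 1/4)
```

A return value of False from `satisfies_corollary_1` does not by itself show non-identifiability, since the corollary gives only a sufficient condition.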

Theorem 2 and Corollary 1 together imply that for $k \le 3$, a $k$-component
mixture of location-shifted symmetric distributions is almost always $k$-identifiable,
in the sense that the set $\Omega_k \setminus \Omega'_k$ has Lebesgue measure zero in $\mathbb{R}^{2k-1}$
(because of the constraint on $\lambda$, $\Omega_k$ should be viewed as a subset of $\mathbb{R}^{2k-1}$,
rather than $\mathbb{R}^{2k}$, in order to have positive Lebesgue measure). We conjecture
that this is true for all $k$; however, because the situation gets even more
complicated for larger $k$, we do not describe $\Omega'_k$ for $k \ge 4$ in this article.

3. Estimation. Given a simple random sample from the distribution
$F_0 = G_0 \star \Delta_k(\lambda^0, \mu^0)$, assuming $(\lambda^0, \mu^0)$ is contained in $\Omega'_k$, it is natural to ask
how one might tackle the problem of deconvolution. The idea for estimating
$(\lambda^0, \mu^0)$ is as follows. Since a distribution is zero-symmetric if and only if
its convolution with $G_0$ is zero-symmetric, Theorem 1 implies that there is
exactly one $(\lambda, \mu) \in \Omega_k$ such that

$$F_0 \star \Delta_k^-(\lambda, \mu) = [G_0 \star \Delta_k(\lambda^0, \mu^0)] \star \Delta_k^-(\lambda, \mu) = G_0 \star [\Delta_k(\lambda^0, \mu^0) \star \Delta_k^-(\lambda, \mu)]$$

is zero-symmetric, namely the true parameter value $(\lambda^0, \mu^0)$. Therefore, our
plan is to search for a $\lambda$ and a $\mu$ that bring $F_n \star \Delta_k^-(\lambda, \mu)$ as close as
possible to being a zero-symmetric distribution, where $F_n$ is the empirical
distribution function derived from the sample. We measure closeness to zero-symmetry
as the distance between a distribution and its reflection across the
origin. To this end, we define real-valued functions

$$d(\lambda, \mu) = \mathcal{D}\{F_0 \star \Delta_k^-(\lambda, \mu), \ F_0^- \star \Delta_k(\lambda, \mu)\} \tag{5}$$

and

$$d_n(\lambda, \mu) = \mathcal{D}\{F_n \star \Delta_k^-(\lambda, \mu), \ F_n^- \star \Delta_k(\lambda, \mu)\}, \tag{6}$$

where $\mathcal{D}(F_1, F_2)$ is some measure of the distance between distribution functions
$F_1$ and $F_2$. Provided $\mathcal{D}(F_1, F_2) = 0$ if and only if $F_1$ coincides with $F_2$,
$(\lambda^0, \mu^0)$ is the unique minimizer of $d(\lambda, \mu)$ and we estimate it by

$$(\hat\lambda, \hat\mu) = \arg\min_{(\lambda, \mu) \in \Omega_k} d_n(\lambda, \mu). \tag{7}$$

[If the minimizer is not unique, then $(\hat\lambda, \hat\mu)$ may be taken to be an arbitrarily
selected minimizer.]

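There are many possible choices for $\mathcal{D}$. Some, however, do not work well
in this context. For example, total variation distance is not a good choice
because $F_n \star \Delta_k^-(\lambda, \mu)$ and $F_n^- \star \Delta_k(\lambda, \mu)$ are both discrete distributions
that are generally supported on entirely different points. In this article, we
focus on $L_p$-distance for $1 \le p \le \infty$, defined for finite $p$ by

$$\mathcal{D}(F_1, F_2) = \left[\int_{-\infty}^{\infty} |F_1(t) - F_2(t)|^p \, dt\right]^{1/p} \tag{8}$$

and for $p = \infty$ by $\mathcal{D}(F_1, F_2) = \sup_t |F_1(t) - F_2(t)|$. Note that if $p < \infty$, then
$\mathcal{D}(F_1, F_2) < \infty$ whenever both $F_1$ and $F_2$ have finite first moments because
$|F_1(t) - F_2(t)|^p \le |F_1(t) - F_2(t)|$ for $p \ge 1$ and

$$\int_{-\infty}^{\infty} |F_1(t) - F_2(t)| \, dt \le \int_{-\infty}^{0} \{F_1(t) + F_2(t)\} \, dt + \int_{0}^{\infty} \{1 - F_1(t) + 1 - F_2(t)\} \, dt = E_{F_1}|X| + E_{F_2}|X|.$$

With this choice of distance and finite $p$,

$$\{d(\lambda, \mu)\}^p = \int_{-\infty}^{\infty} \Bigl| \sum_{j=1}^{k} \lambda_j \{1 - F_0(\mu_j - t) - F_0(\mu_j + t)\} \Bigr|^p \, dt \tag{9}$$

and

$$\{d_n(\lambda, \mu)\}^p = \int_{-\infty}^{\infty} \Bigl| \sum_{j=1}^{k} \lambda_j \{1 - F_n(\mu_j - t) - F_n(\mu_j + t)\} \Bigr|^p \, dt. \tag{10}$$

If $G_0(z)$ has finite first moment, then the minimizer $(\hat\lambda, \hat\mu)$ of $d_n(\lambda, \mu)$ is
strongly consistent. To demonstrate this, we rely on a pair of lemmas.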






Lemma 1. If $1 \le p < \infty$, we assume $G_0(z)$ has finite first moment; if
$p = \infty$, we make no such assumption. In either case, $d(\hat\lambda, \hat\mu) \to 0$ almost
surely as $n \to \infty$.

Lemma 2. Under the assumptions of Lemma 1, for any $\varepsilon > 0$, there
exists $\delta > 0$ such that $d(\lambda, \mu) > \delta$ whenever $\|(\lambda, \mu) - (\lambda^0, \mu^0)\| > \varepsilon$.

Intuitively, Lemma 2 states that $d(\lambda, \mu)$ is bounded away from zero outside
any neighborhood of $(\lambda^0, \mu^0)$. By Lemma 2, the event $\{d(\hat\lambda, \hat\mu) < \delta\}$
is contained in the event $\{\|(\hat\lambda, \hat\mu) - (\lambda^0, \mu^0)\| \le \varepsilon\}$. But $\varepsilon$ is arbitrary, so
by Lemma 1 we conclude that $\|(\hat\lambda, \hat\mu) - (\lambda^0, \mu^0)\| \to 0$ almost surely. This
proves the following theorem.

Theorem 3. Suppose that $G_0(z)$ has finite first moment and $\mathcal{D}(\cdot, \cdot)$ is
$L_p$-distance with $1 \le p < \infty$. Then $(\hat\lambda, \hat\mu) \to (\lambda^0, \mu^0)$ almost surely as $n \to \infty$.
(In the case $p = \infty$, the first moment condition is not necessary.)

Once we have an estimate of $(\lambda^0, \mu^0)$, we turn to the question of estimating
$G_0$. It may be that we only wish to estimate a particular functional
of $G_0$ such as its variance $\sigma^2$. Since $F_0 = G_0 \star \Delta_k(\lambda^0, \mu^0)$, we obtain

$$\sigma^2 = \operatorname{Var}_{F_0}(X) - \sum_{j=1}^{k} \lambda_j^0 (\mu_j^0 - \bar\mu^0)^2,$$

where $\bar\mu^0 = \sum_j \lambda_j^0 \mu_j^0$. In the case $k = 2$, if $S^2$ denotes the sample variance of
$X_1, \ldots, X_n$, then we obtain as an estimator of $\sigma^2$

$$\hat\sigma^2 = S^2 - \hat\lambda_1 \hat\lambda_2 (\hat\mu_2 - \hat\mu_1)^2. \tag{11}$$

If $\sigma^2 < \infty$, then the strong consistency of $\hat\sigma^2$ follows from the strong law of
large numbers and Theorem 3.
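
For $k = 2$, the step from the general variance identity to equation (11) rests on a short calculation, sketched here with $\bar\mu = \lambda_1\mu_1 + \lambda_2\mu_2$ and $\lambda_1 + \lambda_2 = 1$:

```latex
\begin{aligned}
\mu_1 - \bar\mu &= \lambda_2(\mu_1 - \mu_2), \qquad
\mu_2 - \bar\mu = \lambda_1(\mu_2 - \mu_1), \\
\sum_{j=1}^{2} \lambda_j(\mu_j - \bar\mu)^2
  &= \lambda_1\lambda_2^2(\mu_2 - \mu_1)^2 + \lambda_2\lambda_1^2(\mu_2 - \mu_1)^2
   = \lambda_1\lambda_2(\mu_2 - \mu_1)^2 .
\end{aligned}
```

Replacing the population variance by $S^2$ and the parameters by their estimates gives the estimator of equation (11).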

However, we may be interested in estimating the function $G_0(t)$ itself.
We focus on the case $k = 2$. From $F_0 = G_0 \star \Delta_2(\lambda^0, \mu^0)$, we obtain the linear
equation

$$\begin{pmatrix} F_0(x) \\ F_0^-(x - \mu_1^0 - \mu_2^0) \end{pmatrix}
= \begin{pmatrix} \lambda_1^0 & \lambda_2^0 \\ \lambda_2^0 & \lambda_1^0 \end{pmatrix}
\begin{pmatrix} G_0(x - \mu_1^0) \\ G_0(x - \mu_2^0) \end{pmatrix}, \tag{12}$$

valid for all $x$ for which $\mu_1^0 + \mu_2^0 - x$ is a continuity point of $F_0(\cdot)$. Equation (12)
may easily be inverted to give a formula for $G_0(x - \mu_1^0)$ and
$G_0(x - \mu_2^0)$ [note that the identifiability requirement that $\lambda_1^0 \ne 1/2$ is reflected
in the fact that the $2 \times 2$ matrix in equation (12) is singular when
$\lambda_1^0 = 1/2$]. Replacing the parameters $\lambda^0$, $\mu^0$ and $F_0$ by their respective estimates
gives

$$\begin{pmatrix} \hat G_0(x - \hat\mu_1) \\ \hat G_0(x - \hat\mu_2) \end{pmatrix}
= \frac{1}{\hat\lambda_1 - \hat\lambda_2}
\begin{pmatrix} \hat\lambda_1 F_n(x) - \hat\lambda_2 F_n^-(x - \hat\mu_1 - \hat\mu_2) \\
-\hat\lambda_2 F_n(x) + \hat\lambda_1 F_n^-(x - \hat\mu_1 - \hat\mu_2) \end{pmatrix}.$$

We could thus obtain two estimates for $G_0(z)$, one by setting $x - \hat\mu_1 = z$
and the other by setting $x - \hat\mu_2 = z$. Taking the mean of these two estimates
yields

$$\hat G_0(z) = \frac{\hat\lambda_1[F_n^-(z - \hat\mu_1) + F_n(z + \hat\mu_1)] - \hat\lambda_2[F_n^-(z - \hat\mu_2) + F_n(z + \hat\mu_2)]}{2(\hat\lambda_1 - \hat\lambda_2)}. \tag{13}$$

The function $\hat G_0(z)$ has the appealing property that it satisfies the zero-symmetry
condition: at all continuity points $z$, $\hat G_0(z) = 1 - \hat G_0(-z)$. Furthermore,
$\lim_{z \to -\infty} \hat G_0(z) = 0$ and $\lim_{z \to \infty} \hat G_0(z) = 1$. However, $\hat G_0(z)$ is not
necessarily a legitimate distribution function because it is not generally nondecreasing.
Although this may initially appear to be a drawback, we consider
the following corollary of Theorem 3 and the Glivenko-Cantelli theorem ([2],
page 275), which states that $\sup_t |F_n(t) - F_0(t)| \to 0$ almost surely.

Corollary 2. Under the assumptions of Theorem 3 with $k = 2$,
$\sup_z |\hat G_0(z) - G_0(z)| \to 0$ almost surely as $n \to \infty$.

Thus, if we compute and graph an estimate $\hat G_0(z)$ and find that it is not
roughly monotone increasing, then there are two possible causes: either the
sample size is too small for the asymptotics of Corollary 2 or the model
is misspecified. In other words, $\hat G_0(z)$ might serve as a sort of graphical
goodness-of-fit test; we will say more about this in Section 5. For $k > 2$,
derivation of an estimator of $G_0(x)$ is not as straightforward as for $k = 2$
since $G_0$ may not be easily attained as the solution of a system of linear
equations; a different method of deconvolution may be necessary in this
case.
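
The estimate of equation (13) needs only the empirical distribution function and its reflection. The following Python sketch is one possible implementation under stated assumptions: the function names are ours, the reflected empirical distribution is taken to be that of the negated sample, and the parameter estimates are supplied by the caller.

```python
from bisect import bisect_left, bisect_right


def make_g0_hat(sample, lam1, mu1, mu2):
    """Return the function z -> G0_hat(z) of equation (13); needs lam1 != 1/2."""
    xs = sorted(sample)
    n = len(xs)
    lam2 = 1.0 - lam1
    if lam1 == lam2:
        raise ValueError("lam1 = 1/2 makes the 2x2 system singular")

    def F(t):            # empirical cdf: #{X_i <= t} / n
        return bisect_right(xs, t) / n

    def F_reflected(t):  # cdf of the negated sample: #{X_i >= -t} / n
        return (n - bisect_left(xs, -t)) / n

    def g0_hat(z):
        numerator = (lam1 * (F_reflected(z - mu1) + F(z + mu1))
                     - lam2 * (F_reflected(z - mu2) + F(z + mu2)))
        return numerator / (2.0 * (lam1 - lam2))

    return g0_hat


data = [50, 55, 52, 80, 78, 83, 81, 79, 54, 82]   # made-up values for illustration
g = make_g0_hat(data, 0.4, 53, 80)
print(g(0.5) + g(-0.5))  # zero-symmetry: the two values sum to 1 up to rounding
```

Plotting `g` on a grid of z values gives the kind of graphical check described above, since a fitted curve that is far from monotone suggests a small sample or a misspecified model.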



4. Generalizing the Hodges-Lehmann estimator. Although the strong
consistency proved in Theorem 3 and the resulting Corollary 2 are valid
for $L_p$-distance for any $1 \le p \le \infty$, this section and the next consider only
$p = 2$. Here, we demonstrate that the proposed estimator of equation (7),
where $\mathcal{D}$ is $L_2$-distance as defined in equation (8), is a generalization of
the Hodges-Lehmann estimator (1) to the case of finite mixtures. Furthermore,
we establish sufficient conditions for the asymptotic normality of the
estimator when $p = 2$.

Let $H_W(t)$ denote the distribution function of an arbitrary random variable
$W$. Suppose $W$, $W_1$ and $W_2$ are independent and identically distributed
random variables. Then denoting $\max\{W_1, W_2\}$ by $W_1 \vee W_2$, the identities
$H_W^2(t) = H_{W_1 \vee W_2}(t)$ and $H_W(t)H_{-W}(t) = H_{-W_1 \vee W_2}(t)$ imply that

$$\begin{aligned}
\int_{-\infty}^{\infty} \{H_W(t) - H_{-W}(t)\}^2 \, dt
&= \int_{-\infty}^{\infty} \{H_{W_1 \vee W_2}(t) + H_{-W_1 \vee -W_2}(t) - 2H_{-W_1 \vee W_2}(t)\} \, dt \\
&= \int_{-\infty}^{0} \{H_{W_1 \vee W_2}(t) + H_{-W_1 \vee -W_2}(t) - 2H_{-W_1 \vee W_2}(t)\} \, dt \\
&\quad + \int_{0}^{\infty} \{-[1 - H_{W_1 \vee W_2}(t)] - [1 - H_{-W_1 \vee -W_2}(t)] + 2[1 - H_{-W_1 \vee W_2}(t)]\} \, dt \\
&= E\{2(-W_1 \vee W_2) - (W_1 \vee W_2) - (-W_1 \vee -W_2)\}.
\end{aligned}$$

Since $2\max\{a, b\} = a + b + |a - b|$, we obtain

$$\int_{-\infty}^{\infty} \{H_W(t) - H_{-W}(t)\}^2 \, dt = E(|W_1 + W_2| - |W_1 - W_2|). \tag{14}$$

Letting $W, W_1, W_2 \sim F_n \star \Delta_k^-(\lambda, \mu)$, we may combine equation (6) with equation
(14) to obtain

$$\{d_n(\theta)\}^2 = E(|W_1 + W_2| - |W_1 - W_2|) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} f_\theta(X_i, X_j), \tag{15}$$

where $\theta = (\lambda, \mu)$ and

$$f_\theta(X_i, X_j) = \sum_{a=1}^{k} \sum_{b=1}^{k} \lambda_a \lambda_b \bigl(|X_i + X_j - \mu_a - \mu_b| - |X_i - X_j - \mu_a + \mu_b|\bigr). \tag{16}$$

When $k = 1$, the only parameter to estimate is $\mu$, the center of symmetry,
and minimization of expression (15) reduces to

$$\hat\mu = \arg\min_{\mu} \sum_{i=1}^{n} \sum_{j=1}^{n} \left| \frac{X_i + X_j}{2} - \mu \right|. \tag{17}$$

Comparing $\hat\mu$ with the Hodges-Lehmann estimator $\hat\mu_{HL}$ of equation (1), the
two estimators are nearly the same, except that the sum in (1) places twice
as much weight on the cases when $i = j$. Based on this similarity when $k = 1$,
the estimator

$$\hat\theta = \arg\min_{\theta} d_n(\theta) = \arg\min_{\theta} \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} f_\theta(X_i, X_j) \tag{18}$$

may be categorized as a generalization of the Hodges-Lehmann estimator
for $k \ge 2$.
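
To make the equivalence between the integral form (10) with p = 2 and the double-sum form (15)-(16) concrete, the Python sketch below evaluates the squared criterion both ways. It is a direct, unoptimized illustration with names of our choosing. The difference of absolute values in the double sum follows equation (14), and the integral is computed exactly because the integrand is a step function.

```python
def dn_squared(sample, lams, mus):
    """Squared L2 criterion via the double sum of equations (15)-(16)."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        for xj in sample:
            for la, ma in zip(lams, mus):
                for lb, mb in zip(lams, mus):
                    total += la * lb * (abs(xi + xj - ma - mb) - abs(xi - xj - ma + mb))
    return total / n ** 2


def dn_squared_by_integration(sample, lams, mus):
    """Same quantity from equation (10) with p = 2, integrating the step function."""
    n = len(sample)

    def integrand(t):
        s = 0.0
        for lam, mu in zip(lams, mus):
            below = sum(x <= mu - t for x in sample) / n  # F_n(mu_j - t)
            above = sum(x <= mu + t for x in sample) / n  # F_n(mu_j + t)
            s += lam * (1.0 - below - above)
        return s * s

    # The integrand is constant between consecutive breakpoints and zero outside them.
    cuts = sorted({sign * (x - mu) for x in sample for mu in mus for sign in (1, -1)})
    return sum((hi - lo) * integrand((lo + hi) / 2) for lo, hi in zip(cuts, cuts[1:]))


x = [0.3, -1.2, 2.5, 0.9, -0.4]   # made-up values for illustration
print(dn_squared(x, [0.3, 0.7], [-1.0, 1.0]))
print(dn_squared_by_integration(x, [0.3, 0.7], [-1.0, 1.0]))  # agrees up to rounding
```

With k = 1 the double sum depends on the location parameter only through the absolute values of the pairwise sums minus twice that parameter, which is the reduction to equation (17).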
To establish the asymptotic normality of $\hat\theta$, note that $\{d_n(\theta)\}^2$ from equation (15)
is a $V$-process (i.e., a set of $V$-statistics indexed by the parameter
$\theta$). Define the functionals

$$V^{(k)}(h) = E\,h(X_1, \ldots, X_k)$$

and

$$V_n^{(k)}(h) = \frac{1}{n^k} \sum_{i_1=1}^{n} \cdots \sum_{i_k=1}^{n} h(X_{i_1}, \ldots, X_{i_k}),$$

where $h$ is some scalar- or vector-valued function of $k$ real variables (recall
that $X_1, \ldots, X_n$ is a random sample from $F_0$). In particular, $\{d(\theta)\}^2 = V^{(2)}(f_\theta)$
and $\{d_n(\theta)\}^2 = V_n^{(2)}(f_\theta)$. The Hoeffding decomposition for $V$-statistics
has exactly the same form as it has for $U$-statistics (cf. [1]),

$$V_n^{(m)}(h) = \sum_{k=0}^{m} \binom{m}{k} V_n^{(k)}(\pi_k h), \tag{19}$$

where $\pi_k h$ is the $k$th Hoeffding projection defined, using the notation of
empirical processes, by

$$\pi_k h(x_1, \ldots, x_k) = (\delta_{x_1} - F_0) \cdots (\delta_{x_k} - F_0) F_0^{m-k} h, \tag{20}$$

where $Qf = \int f \, dQ$ denotes the action of the expectation operator under the
distribution $Q$ on the function $f$ and $\delta_x$ denotes a point mass at $x$ (see [9]
for an alternative formulation of the $\pi_k h$ projection functions). Therefore,
the sufficient conditions established by Arcones, Chen and Giné [1] for the
asymptotic normality of $U$-processes are valid in the present case. Their
Theorem 2.1, establishing asymptotic normality, is based on Theorem 2 of
[12] and we adapt these theorems to the present situation as follows.

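Theorem 4. Assume that $G_0(z)$ has finite first moment and that the
following hold:

(i) $V^{(2)}(f_\theta)$, as a function of $\theta$, has strictly positive definite second
derivative $J$ at its minimizing value $\theta^0$;

(ii) for any $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\limsup_{n \to \infty} P\Bigl\{ \sup_{\theta \in B_\delta} \bigl| n V_n^{(2)}(\pi_2 f_\theta - \pi_2 f_{\theta^0}) \bigr| > \varepsilon \Bigr\} < \varepsilon,$$

where $B_\delta$ denotes the open ball of radius $\delta$ centered at $\theta^0$;

(iii) there exists a measurable vector-valued function $\Delta(x)$ satisfying $E\Delta(X) = 0$,
$E\|\Delta(X)\|^2 < \infty$ and

$$\pi_1 f_\theta(x) = \pi_1 f_{\theta^0}(x) + (\theta - \theta^0)^T \Delta(x) + \|\theta - \theta^0\| \, r_\theta(x)$$

for some $r_\theta$ such that for any $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\limsup_{n \to \infty} P\Bigl\{ \sup_{\theta \in B_\delta} \bigl| \sqrt{n} \, V_n^{(1)}(r_\theta) \bigr| > \varepsilon \Bigr\} < \varepsilon,$$

where $B_\delta$ denotes the open ball of radius $\delta$ centered at $\theta^0$.

Then $\sqrt{n}(\hat\theta - \theta^0) \to N(0, 4J^{-1} \Sigma J^{-1})$ in distribution, where $\Sigma$ is the
covariance matrix of $\Delta(X)$.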

We caution that although the covariance formula concluding Theorem 4 
appears simple, in our experience, it is extremely complicated to use in 
practice. Thus, we recommend a bootstrapped estimate of the estimator's 
covariance if one is needed. 
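
A nonparametric bootstrap of the kind recommended above can be sketched in a few lines of Python. This is a generic illustration, not the authors' code: `estimator` is a hypothetical user-supplied function returning a tuple of estimates, and the default of 200 resamples simply mirrors the number reported for Table 1.

```python
import random
import statistics


def bootstrap_standard_errors(sample, estimator, n_resamples=200, seed=0):
    """Per-coordinate bootstrap standard errors of a tuple-valued estimator."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Resample n observations with replacement and re-estimate.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(tuple(estimator(resample)))
    # Standard deviation of each coordinate across the bootstrap replicates.
    return [statistics.stdev(coordinate) for coordinate in zip(*replicates)]


def mean_and_median(xs):
    return statistics.mean(xs), statistics.median(xs)


print(bootstrap_standard_errors([float(v) for v in range(20)], mean_and_median))
```

A full covariance matrix could be formed from the same replicates if the off-diagonal terms are needed.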

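The general idea of the proof of Theorem 4 is as follows. First, define
$\tilde\theta = \theta^0 - 2J^{-1} V_n^{(1)}(\Delta)$. Show that both $\sqrt{n}(\hat\theta - \theta^0)$ and $\sqrt{n}(\tilde\theta - \theta^0)$ are
stochastically bounded (the latter fact follows directly from the central limit
theorem). Use these facts to prove that

$$n V_n^{(2)}(f_{\hat\theta} - f_{\theta^0}) = \frac{n}{2}(\hat\theta - \theta^0)^T J (\hat\theta - \theta^0) - n(\hat\theta - \theta^0)^T J (\tilde\theta - \theta^0) + o_p(1), \tag{21}$$

$$n V_n^{(2)}(f_{\tilde\theta} - f_{\theta^0}) = \frac{n}{2}(\tilde\theta - \theta^0)^T J (\tilde\theta - \theta^0) - n(\tilde\theta - \theta^0)^T J (\tilde\theta - \theta^0) + o_p(1). \tag{22}$$

Now, since $\hat\theta$ minimizes $V_n^{(2)} f_\theta$, subtract equation (22) from equation (21)
to obtain

$$0 \le -\frac{n}{2} \bigl\| J^{1/2}(\hat\theta - \tilde\theta) \bigr\|^2 + o_p(1),$$

which implies that $\sqrt{n}(\hat\theta - \tilde\theta) \to 0$ in probability. Since $\sqrt{n}(\tilde\theta - \theta^0)$ converges
in distribution to $N(0, 4J^{-1} \Sigma J^{-1})$ by the central limit theorem, this proves
the result. Since the technical details missing above do not differ from those
in the proof of Theorem 2.1 in [1], we do not repeat them here. These
arguments are based on the proofs of Theorem 2 and Lemma 3 in [12], to
which we also refer the interested reader.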

5. Implementation and examples. Although equation (15) allows an intuitive
appreciation of the estimation method based on $L_2$-norm minimization,
it is not the most convenient formula from a computational standpoint.
To aid notation, we introduce the functional inner product

$$\langle f, g \rangle = \int_{-\infty}^{\infty} f(t) g(t) \, dt$$

and let $\|f\| = \sqrt{\langle f, f \rangle}$ denote the corresponding norm. Therefore, if we define

$$a_j(t; \mu) = \frac{1}{n} \sum_{i=1}^{n} I\{\mu_j - X_i < t\} - \frac{1}{n} \sum_{i=1}^{n} I\{X_i - \mu_j \le t\}$$

for $j = 1, \ldots, k$, then equation (10) implies that we may succinctly write
$d_n(\lambda, \mu) = \|\sum_j \lambda_j a_j\|$.

In the case $k = 2$, we have $\lambda_1 + \lambda_2 = 1$ and thus $\{d_n(\lambda, \mu)\}^2 = \|\lambda_1 a_1 + \lambda_2 a_2\|^2$
is a quadratic function in $\lambda_1$, minimized by $\hat\lambda_1 = -\langle a_1 - a_2, a_2 \rangle / \|a_1 - a_2\|^2$.
Substituting $\hat\lambda_1$ into $\{d_n(\lambda, \mu)\}^2$ gives

$$m(\mu_1, \mu_2) = \frac{\|a_1\|^2 \|a_2\|^2 - \langle a_1, a_2 \rangle^2}{\|a_1 - a_2\|^2}$$

as the function we wish to minimize. We accomplish this minimization using
the "optim" function in R, which implements an iterative Nelder-Mead
algorithm [type "help(optim)" in R for details]. We recommend multiple
starting values due to the fact that $m(\mu)$ often has multiple local minima.
Code that implements our method for $k = 2$, written in R, is available at
www.stat.psu.edu/~dhunter/code.
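
The profiled objective for k = 2 can be sketched in Python as follows. This is not the R code referred to above; it is a minimal illustration with names of our choosing, in which the inner products of the step functions are integrated exactly and the closed-form weight is returned without being constrained to the unit interval.

```python
def step_difference(sample, mu_j, t):
    """a_j(t; mu) = #{mu_j - X_i < t}/n - #{X_i - mu_j <= t}/n."""
    n = len(sample)
    return (sum(mu_j - x < t for x in sample) - sum(x - mu_j <= t for x in sample)) / n


def inner_product(sample, mu_a, mu_b):
    """Inner product of a_a and a_b, integrating the piecewise-constant product."""
    cuts = sorted({s * (x - m) for x in sample for m in (mu_a, mu_b) for s in (1, -1)})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * step_difference(sample, mu_a, mid) * step_difference(sample, mu_b, mid)
    return total


def profile_objective(sample, mu1, mu2):
    """Return (m(mu1, mu2), lambda1_hat) for the two-component case."""
    n11 = inner_product(sample, mu1, mu1)
    n22 = inner_product(sample, mu2, mu2)
    n12 = inner_product(sample, mu1, mu2)
    diff_sq = n11 - 2.0 * n12 + n22    # ||a_1 - a_2||^2
    lam1 = -(n12 - n22) / diff_sq      # -<a_1 - a_2, a_2> / ||a_1 - a_2||^2
    m = (n11 * n22 - n12 ** 2) / diff_sq
    return m, lam1


x = [-2.1, -0.7, -1.3, 0.4, 1.2, 0.8, 1.9, 1.1, 0.6, 1.5]   # made-up values
print(profile_objective(x, -1.0, 1.0))
```

The first returned value could be handed to any derivative-free minimizer over the two location parameters, started from several initial values as recommended above.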

We first apply our semi-parametric estimation procedure (hereafter re- 
ferred to as SP) to the Old Faithful data depicted in Figure 1. Results are 
compared with those obtained by maximizing the two-component normal 
mixture likelihood (a procedure we call NMLE from now on) that assumes 
components with equal variances. The assumption of equal variances in the 
NMLE case not only provides a fair comparison with the SP method (which 
assumes components with exactly the same shape) but it also avoids the 
awkward situation of an unbounded likelihood function created when un- 
equal variances are assumed in the normal mixture (we say more about this 
in Section 6). 

In Table 1, we see very close agreement between the two methods, with 
the standard errors only slightly larger for the SP method, even though 
Figure 2 indicates that the data in each component appear to follow a normal 
distribution quite closely. This figure depicts the close agreement between 
the estimate $\hat G_0$ of equation (13) and the estimate based on the NMLE
method, namely, a normal distribution function with mean 0 and variance
34.45. One important benefit of the SP method, even in cases like this in
which the data appear to be well modeled by a normal distribution, is that we 
may validate the normality assumption using this nonparametric estimate 
of the underlying symmetric cdf. 

Next, we compare the SP method to the NMLE method for various types 
of simulated datasets. Results are summarized in Figure 3. The symmet- 
ric component distributions are taken to be normal, double exponential, 
uniform or t2 {t on two degrees of freedom). The values of A? are taken 
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Table 1 

Shown here are parameter estimates along with bootstrapped standard errors for the semi- 
parametric approach (SP) and the normal mixture approach using maximum likelihood 
estimation (NMLE) for the Old Faithful geyser data. The bootstrapped estimates are 

based on 200 resamples 





Ml (SE) 


A2 (SE) 


A (SE) 


(SE) 


SP 


54.00 (0.76) 


80.00 (0.50) 


0.352 (0.032) 


30.66 (7.93) 


NMLE 


54.61 (0.67) 


80.09 (0.45) 


0.361 (0.032) 


34.45 (3.39) 



from the set $\{0.15, 0.30, 0.45\}$ and each sample is of size $n = 200$. Because
both methods are susceptible to finding local optimum points, each algorithm
was started at several places for every dataset. The initial values of
$\mu_1$ and $\mu_2$ were taken to be the $q_1$ and $q_2$ sample quantiles for each of
the ten possible combinations satisfying $q_1, q_2 \in \{0.05, 0.2, 0.5, 0.8, 0.95\}$ and
$q_1 < q_2$. Furthermore, the EM algorithm for the NMLE method was started
with initial values $\lambda_1 = 0.5$ and $\sigma^2$ equal to half of the sample variance. The
parameter estimates in all cases were taken to be the values corresponding
to the best value of the objective function (i.e., the lowest value of $d_n$ or the
highest value of the likelihood) among the ten. We took $\mu_1^0 = -\mu_2^0 = -1$ and
$\sigma_0^2 = 1$ for all normal, double exponential and uniform examples. For the $t_2$
distribution, which has infinite variance, we took $\mu_1^0 = -\mu_2^0 = -2$.
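
The multi-start scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the interpolation rule used for the sample quantiles is our choice (the text does not specify one), and `fit` is a hypothetical placeholder for either optimizer (the minimizer of $d_n$, or EM with the sign of the log-likelihood flipped so that lower is better).

```python
import itertools

def sample_quantile(sorted_data, q):
    """Sample quantile by linear interpolation between order statistics
    (an assumed convention; other quantile definitions would also do)."""
    pos = q * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    frac = pos - lo
    return (1 - frac) * sorted_data[lo] + frac * sorted_data[hi]

def starting_values(data, levels=(0.05, 0.2, 0.5, 0.8, 0.95)):
    """Return the ten (mu1, mu2) starting pairs, one for each q1 < q2."""
    xs = sorted(data)
    return [(sample_quantile(xs, q1), sample_quantile(xs, q2))
            for q1, q2 in itertools.combinations(levels, 2)]

def best_fit(data, fit):
    """Run the hypothetical fit(data, mu1, mu2) -> (objective, estimate)
    from every starting pair and keep the lowest objective value."""
    results = [fit(data, m1, m2) for m1, m2 in starting_values(data)]
    return min(results, key=lambda r: r[0])
```

Because the five quantile levels are listed in increasing order, `itertools.combinations` yields exactly the ten pairs with $q_1 < q_2$.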

For heavy-tailed distributions such as the double exponential and especially
$t_2$, the SP method outperforms the NMLE method. Perhaps since
$\lambda_1 = 0.30$ is the farthest value from the nonidentifiable situations $\lambda_1 = 0$ and



Fig. 2. The jagged line is $G_0$, estimated from the Old Faithful data using equation (13),
and the other line is the NMLE estimate of $G_0$, which is forced to be normal.

Fig. 3. Shown above are scatterplots of the parameter estimates $(\hat\mu_1, \hat\mu_2)$ from 200
simulated datasets of size 200. The true value of $(\mu_1, \mu_2)$ is $(-1, 1)$ in all plots except the $t_2$
plots, where it is $(-2, 2)$. Each point is represented by the leading digit of $\hat\lambda$ rounded to the
nearest tenth. The dashed lines are one sample standard deviation of $\hat\mu_i$ on either side of
the sample mean of $\hat\mu_i$ for $i = 1, 2$.



$\lambda_1 = 0.5$, the SP method fares better at that value of $\lambda_1$ than at $\lambda_1 = 0.15$ or
$\lambda_1 = 0.45$. Nonetheless, the SP method performed surprisingly well for the
$\lambda_1 = 0.45$ case, despite the fact that 0.45 is so close to the nonidentifiable
value of 0.5. Both SP and NMLE had the most difficult time at $\lambda_1 = 0.15$,
regardless of the type of component distributions.

Fig. 4. Estimation methods applied to simulated datasets that violate assumptions. All
sets of axes are identical and the content of each plot is as explained in Figure 3. In case 1,
both the symmetry and equal-variance assumptions are violated; in case 2, only symmetry is
violated; in case 3, only the equal-variance assumption is violated. In each case,
$\mathrm{df}_1 = 50$ and $\mathrm{df}_2 = 75$.
[Fig. 4 panels: SP: Case 1; NMLE: Case 1]

Fig. 5. Both the upper plots and lower plots compare unfavorably with the values $\mathrm{df}_1 = 50$
and $\mathrm{df}_2 = 75$ used in Figure 4.

Finally, we consider the robustness of both methods to violations of the
assumptions that the component distributions are symmetric and that the
components differ only in location. For nonsymmetric distributions, we use
$\chi^2$ distributions. Suppose that $\mathrm{df}_1$ and $\mathrm{df}_2$ are given positive integers, where
$\mathrm{df}_1 < \mathrm{df}_2$. Then we consider the following three cases:

1. $\lambda\chi^2(\mathrm{df}_1) + (1-\lambda)\chi^2(\mathrm{df}_2)$;

2. $\lambda\chi^2(\mathrm{df}_1) + (1-\lambda)[(\mathrm{df}_2 - \mathrm{df}_1) + \chi^2(\mathrm{df}_1)]$;

3. $\lambda N(\mathrm{df}_1, 2\mathrm{df}_1) + (1-\lambda)N(\mathrm{df}_2, 2\mathrm{df}_2)$.

Both assumptions are violated in case 1, only the symmetry assumption is
violated in case 2 and only the equal-variances assumption is violated in
case 3.
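
For readers who wish to reproduce datasets of this kind, the three cases can be simulated with a few lines of code. The sketch below is ours, not the authors' simulation code. It assumes that the second argument of $N(\cdot,\cdot)$ is a variance (so that case 3 matches the mean and variance of the corresponding $\chi^2$ distributions), and the function names, default arguments and seed are illustrative.

```python
import random

def chi2(rng, df):
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)

def draw(rng, case, lam, df1, df2):
    """Draw one observation from the mixture of case 1, 2 or 3."""
    first = rng.random() < lam  # component 1 is chosen with probability lam
    if case == 1:
        return chi2(rng, df1) if first else chi2(rng, df2)
    if case == 2:
        # Second component: the first component's shape, shifted by df2 - df1.
        return chi2(rng, df1) if first else (df2 - df1) + chi2(rng, df1)
    if case == 3:
        df = df1 if first else df2
        return rng.gauss(df, (2.0 * df) ** 0.5)  # mean df, variance 2*df (assumed)
    raise ValueError("case must be 1, 2 or 3")

def simulate(case, n=200, lam=0.3, df1=50, df2=75, seed=0):
    rng = random.Random(seed)
    return [draw(rng, case, lam, df1, df2) for _ in range(n)]
```

In all three cases the mixture mean is $\lambda\,\mathrm{df}_1 + (1-\lambda)\,\mathrm{df}_2$, which gives a quick sanity check on simulated output.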

Consider $\mathrm{df}_1 = 50$ and $\mathrm{df}_2 = 75$ (all three cases) as an example (Figure 4).
Not surprisingly, case 1 fares the worst. In a comparison between case 2 and
case 3, it appears that case 3 is slightly better overall, suggesting that the
skewness causes greater problems than the unequal variances. This impression
is further strengthened by the fact that $\mathrm{df}_1 = 10$, $\mathrm{df}_2 = 15$ fares much
worse in case 2 than $\mathrm{df}_1 = 50$, $\mathrm{df}_2 = 75$ (see the upper plots in Figure 5).
On the other hand, poorly separated means appear to be more detrimental
than mismatched variances, since case 3 fares much better for $\mathrm{df}_1$ and $\mathrm{df}_2$
well separated than close together. For instance, with $\mathrm{df}_1 = 50$, the performance
improves in case 2 as $\mathrm{df}_2$ increases, say, from 60 to 75 (see the lower
plots in Figure 5). Further numerical tests reinforce these general impressions.

In summary, we find that for both methods, the presence of unequal 
variances appears to be the least detrimental violation of the assumptions; 
it is not even as serious as poorly-separated means. On the other hand, 
violation of the symmetry assumption appears to have a much stronger and 
more unpredictable effect on the results. 

6. Discussion. This article addresses the question of how restrictive the 
assumptions about a mixture distribution must be in order for identifiability 
to hold and hence for estimation to make sense. In particular, we investigate 
the effect of presuming that the component distributions are symmetric and 
that they are all the same, except for location shifts. We establish compre- 
hensive identifiability results in the 2- and 3-component cases and indicate 
a direction of analysis for k > 3 (although similar results appear to become
prohibitively complicated for larger k). We emphasize that with so little as- 
sumed about the form of the component distributions, it is quite surprising 
that identifiability is provable at all. Relaxing our assumptions even fur- 
ther by, say, allowing location-scale transformations (instead of just location 
transformations) is likely to have a major impact on identifiability by under- 
mining the convolution structure so central to the theoretical development in 
this article. Another possible assumption one might consider is unimodality, 
instead of symmetry, of components. However, such an assumption would 
certainly destroy identifiability unless additional restrictions were assumed;
furthermore, Walther [14, 15] has shown that mode-hunting and the search
for mixture structure are not always compatible. 

There are clear practical applications of the estimation method we pro- 
pose. The simulation studies and data analysis in Section 5 suggest that our 
method is robust to different component distribution shapes and that its 
performance is never much worse than, and sometimes much better than, 
the canonical method of maximum likelihood assuming normal components. 
In addition, we have shown in a two-component example how our method 
allows the validation of parametric assumptions used to fit mixture models 
by providing a nonparametric estimate of the (unshifted) zero-symmetric 
components. 

The assumption that each of the component distributions must have ex- 
actly the same shape, which means, among other things, that they must have 
the same variance if the variance is defined, may, at first glance, seem like an 
overly rigid assumption. In particular, when comparing our method with the 
method of maximum likelihood using normal components, it may seem that 
the normal approach has an advantage since it appears to offer the possibil- 
ity of allowing different component variances. However, the equal-variance 
assumption is quite common throughout statistics (e.g., in ANOVA) and 
for cases in which this assumption is appropriate, it is wise — both in terms 
of statistical power and model parsimony — to utilize inference techniques 
that implement it. Furthermore, the comparison with the normal approach 
is slightly misleading: if we assume unknown means and unknown variances 
that are different, then the normal mixture likelihood is unbounded and 
therefore has no maximizer (although this defect may be mended by plac- 
ing a positive lower bound on the component variances). Finally, we point 
out that our method is consistent, even when there is no finite second mo- 
ment such as in the simulated t-distribution examples of Section 5, a case 
in which the parametric method performs poorly. We believe that the SP 
method offers an attractive alternative and/or complement to the standard 
parametric estimation in many problems. 
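
The unboundedness of the normal mixture likelihood with free component variances is easy to see numerically. The sketch below is an illustration of that standard fact only: the five data values are made up, and the function names are ours. One component is centered exactly on an observation and its standard deviation is shrunk toward zero.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def loglik(data, lam, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of a two-component normal mixture with free variances."""
    return sum(math.log(lam * normal_pdf(x, mu1, sigma1)
                        + (1.0 - lam) * normal_pdf(x, mu2, sigma2))
               for x in data)

data = [-1.3, -0.4, 0.2, 0.9, 2.1]  # made-up illustrative values
# Center component 1 on the first observation and shrink its spread.
values = [loglik(data, 0.5, data[0], s, 0.0, 1.0)
          for s in (1.0, 1e-2, 1e-4, 1e-8)]
```

The entries of `values` increase as the spread shrinks, and nothing stops them from growing further. A positive lower bound on the component standard deviations removes this degeneracy.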



We first prove Theorem 1. For general $k$, define independent random
variables $Y$ and $Z$ such that $Z \sim G$ and $Y \sim \Delta_k(\lambda, \mu)$, where $G \in \mathcal{S}$ and
$(\lambda, \mu) \in \Omega_k$. Then by equation (3), $X = Y + Z$ has the mixture distribution
$F$. In terms of the characteristic functions of these random variables,
we have $\phi_X(t) = \phi_Y(t)\phi_Z(t)$. Suppose that

(A.1)    $\phi_Y(t)\phi_Z(t) = \phi_{Y'}(t)\phi_{Z'}(t)$

for independent $Z' \sim G'$ and $Y' \sim \Delta_k(\lambda', \mu')$. Note that $k$-identifiability of
$F$ holds if and only if equation (A.1) implies $Y \stackrel{d}{=} Y'$ and $Z \stackrel{d}{=} Z'$, where $\stackrel{d}{=}$
denotes equality of distributions. A pair of lemmas will help us to prove the
theorem.

Lemma A.1. Equation (A.1) implies that $Y - Y'$ is zero-symmetric.

Note that $Y - Y'$ can always be made zero-symmetric by taking $Y' \stackrel{d}{=} Y$. If
this choice of $Y'$ is the only choice that makes $Y - Y'$ zero-symmetric, then
$k$-identifiability follows, as stated in Lemma A.2.

Lemma A.2. For $Y \sim \Delta_k(\lambda, \mu)$, suppose that $Y - Y'$ is zero-symmetric
for independent $Y' \sim \Delta_k(\lambda', \mu')$ only if $\lambda = \lambda'$ and $\mu = \mu'$. Then for any
$Z \sim G \in \mathcal{S}$, the distribution of $X = Y + Z$ is $k$-identifiable.

Lemma A.2 implies that $\Omega_k^0$ must contain all $(\lambda, \mu)$ such that $\Delta_k(\lambda, \mu)$
cannot be zero-symmetrized by convolution with any $k$-point distribution
other than $\Delta_k^-(\lambda, \mu)$. But these $(\lambda, \mu)$ are the only possible elements of $\Omega_k^0$,
for if $\Delta_k(\lambda, \mu) \star \Delta_k^-(\lambda', \mu')$ is zero-symmetric, but $(\lambda', \mu') \ne (\lambda, \mu)$, then

$\Delta_k^-(\lambda, \mu) \star \{\Delta_k(\lambda, \mu) \star \Delta_k^-(\lambda', \mu')\} = \Delta_k^-(\lambda', \mu') \star \{\Delta_k(\lambda, \mu) \star \Delta_k^-(\lambda, \mu)\}$

is not $k$-identifiable. This proves Theorem 1; all that remains is to prove
Lemmas A.1 and A.2.

Proof of Lemma A.1. A random variable is zero-symmetric if and
only if its characteristic function is real-valued ([2], page 363). Multiplying
each side of equation (A.1) by the complex conjugate of $\phi_{Y'}(t)$, namely
$\phi_{-Y'}(t)$, we conclude that $\phi_Y(t)\phi_{-Y'}(t)$ is real-valued for all $t$ such that
$\phi_Z(t) \ne 0$. Since $\phi_Z(t)$ is nonzero in a neighborhood of $t = 0$, the analytic
function

$\mathrm{Im}\{\phi_Y(t)\phi_{-Y'}(t)\} = \sum_{i=1}^{k}\sum_{j=1}^{k} \lambda_i\lambda'_j \sin t(\mu_i - \mu'_j)$

equals zero on an open interval and must thus be identically zero on the
whole real line. We conclude that if equation (A.1) holds, then $\phi_Y(t)\phi_{-Y'}(t)$
is real-valued, so $Y - Y'$ is zero-symmetric. □
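
As a numerical sanity check on the displayed formula, the imaginary part of the product of characteristic functions can be compared with the double sum of sines. The sketch below is ours and uses arbitrary illustrative weights and support points. It relies only on the fact that the characteristic function of $-Y'$ is the complex conjugate of that of $Y'$.

```python
import cmath
import math

def cf(weights, points, t):
    """Characteristic function of the discrete law putting weights[j] on points[j]."""
    return sum(w * cmath.exp(1j * t * x) for w, x in zip(weights, points))

def imag_product(lam, mu, lam2, mu2, t):
    """Im{phi_Y(t) * phi_{-Y'}(t)}, with phi_{-Y'}(t) = conj(phi_{Y'}(t))."""
    return (cf(lam, mu, t) * cf(lam2, mu2, t).conjugate()).imag

def double_sum(lam, mu, lam2, mu2, t):
    """Sum over i, j of lam_i * lam'_j * sin(t * (mu_i - mu'_j))."""
    return sum(li * lj * math.sin(t * (mi - mj))
               for li, mi in zip(lam, mu)
               for lj, mj in zip(lam2, mu2))
```

When $Y'$ has the same distribution as $Y$, the terms of the double sum cancel in pairs, which is the trivial symmetrization noted after Lemma A.1.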

Proof of Lemma A.2. By Lemma A.1 and the hypothesis of the lemma, equation (A.1) implies that
$\phi_Y(t) = \phi_{Y'}(t)$, so $\phi_Z(t) = \phi_{Z'}(t)$ whenever $\phi_Y(t) \ne 0$. But $\phi_Y(t)$ is an
analytic function that is not identically zero, so $\{t : \phi_Y(t) = 0\}$ is a discrete
set. For continuous functions $\phi_Z(t)$ and $\phi_{Z'}(t)$ to agree outside a discrete set,
they must be identical. Therefore, equation (A.1) implies both $Y \stackrel{d}{=} Y'$ and
$Z \stackrel{d}{=} Z'$, so the distribution of $Y + Z$ is $k$-identifiable. □

Next, we prove Theorem 2 and then state and prove a similar characterization
for $k = 3$ in Theorem A.1.

Proof of Theorem 2. For any $(\lambda, \mu) \in \Omega_2$ with $2\lambda_1 \in \{0, 1, 2\}$, $\Delta_2(\lambda, \mu)$
is symmetric and thus $G \star \Delta_2(\lambda, \mu)$ cannot be 2-identifiable. Conversely, take
$Y \sim \Delta_2(\lambda, \mu)$ with $2\lambda_1 \notin \{0, 1, 2\}$ and suppose that $Y' \sim \Delta_2(\lambda', \mu')$ is
independent of $Y$ with the property that $Y - Y'$ is zero-symmetric.

The largest and smallest values assumed by $Y - Y'$ must be opposites
and receive the same weight by zero-symmetry, which implies that $\mu_2 - \mu'_1 =
\mu'_2 - \mu_1$ and $\lambda_1\lambda'_2 = \lambda'_1\lambda_2$. Thus, $\lambda = \lambda'$. If $\mu \ne \mu'$, then neither $\mu_1 - \mu'_1$ nor
$\mu_2 - \mu'_2$ can be zero, so these points must receive the same weight, leading
to $\lambda_1 = \lambda_2$, a contradiction. We conclude that $Y$ and $Y'$ have the same
distribution. □
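
The two-point argument can be checked on small examples with exact rational arithmetic. The following sketch is ours. It tabulates the law of $Y - Y'$ for independent discrete $Y$ and $Y'$ and tests zero-symmetry. The specific weights and support points used in the checks are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def difference_law(lam, mu, lam2, mu2):
    """Exact distribution of Y - Y' for independent discrete Y and Y'."""
    law = defaultdict(Fraction)
    for li, mi in zip(lam, mu):
        for lj, mj in zip(lam2, mu2):
            law[mi - mj] += li * lj
    return dict(law)

def is_zero_symmetric(law):
    """True if every point x carries the same mass as -x."""
    return all(law.get(-x, 0) == p for x, p in law.items())
```

With $\lambda = (1/3, 2/3)$ on the points $(0, 1)$, taking $Y'$ equal in distribution to $Y$ gives a zero-symmetric difference, whereas swapping the two weights does not. With $\lambda_1 = 1/2$, a point mass at $1/2$ also symmetrizes $Y$, which illustrates the nonidentifiable case.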

We now consider the case $k = 3$. We start by giving two distinct cases in
which a nonsymmetric 3-point distribution may be nontrivially symmetrized
by convolution with another 3-point distribution [recall that $\Delta_3(\lambda, \mu)$ may
always be trivially symmetrized by convolution with $\Delta_3^-(\lambda, \mu)$]. We then
assert in Theorem A.1 that these are the only such 3-point distributions.

Case 1. For any real numbers $c$, $d$ and $r$ such that $d > 0$ and $r > 1$, let

(A.2)    $\mu = (c, c + 4d, c + 6d)$ and $\lambda \propto (r^2, r^2 - 1, r)$,

(A.3)    $\mu' = (c + d, c + 3d, c + 5d)$ and $\lambda' \propto (r, r + 1, 1)$.

Then

$\Delta_3(\lambda, \mu) \star \Delta_3^-(\lambda', \mu') = \Delta_6\{(-5d, -3d, -d, d, 3d, 5d), \tau\}$

is zero-symmetric, where $\tau \propto (r^2, r^3 + r^2, r^3 + r^2 - 1, r^3 + r^2 - 1, r^3 + r^2, r^2)$.

Case 2. For any real numbers $c$, $d$ and $r$ such that $d > 0$ and $r > 1$, let

(A.4)    $\mu = (c, c + 3d, c + 4d)$ and $\lambda \propto (r\sqrt{r}, (r - 1)\sqrt{r + 1}, \sqrt{r})$,

(A.5)    $\mu' = (c + d, c + 2d, c + 3d)$ and $\lambda' \propto (r, \sqrt{r + r^2}, 1)$.

Then

$\Delta_3(\lambda, \mu) \star \Delta_3^-(\lambda', \mu') = \Delta_7\{(-3d, -2d, -d, 0, d, 2d, 3d), \tau\}$

is zero-symmetric, where

$\tau \propto (r\sqrt{r}, r^2\sqrt{r + 1}, r^2\sqrt{r}, (r - 1)\sqrt{r + 1}, r^2\sqrt{r}, r^2\sqrt{r + 1}, r\sqrt{r})$.
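
Both cases are easy to confirm numerically. The sketch below is ours and fixes $c = 0$ and $d = 1$ without loss of generality. It computes the unnormalized weights of $Y - Y'$ for the weight vectors in (A.2)-(A.5) and checks that they are symmetric about zero. The function names and the tolerance are illustrative choices.

```python
import math
from collections import defaultdict

def difference_weights(lam, mu, lam2, mu2):
    """Unnormalized weights of Y - Y' at each support point."""
    w = defaultdict(float)
    for li, mi in zip(lam, mu):
        for lj, mj in zip(lam2, mu2):
            w[mi - mj] += li * lj
    return dict(w)

def symmetric(w, tol=1e-9):
    """True if the weight at x matches the weight at -x for every x."""
    return all(math.isclose(p, w.get(-x, 0.0), rel_tol=tol) for x, p in w.items())

def case1(r):
    # Conditions (A.2) and (A.3) with c = 0, d = 1.
    return difference_weights((r**2, r**2 - 1, r), (0, 4, 6),
                              (r, r + 1, 1), (1, 3, 5))

def case2(r):
    # Conditions (A.4) and (A.5) with c = 0, d = 1.
    s = math.sqrt(r)
    return difference_weights((r * s, (r - 1) * math.sqrt(r + 1), s), (0, 3, 4),
                              (r, math.sqrt(r + r**2), 1), (1, 2, 3))
```

For example, `case1(2)` places unnormalized weights 4, 12, 11, 11, 12, 4 on the points $-5, -3, -1, 1, 3, 5$, in agreement with the formula for $\tau$ at $r = 2$.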
Theorem A.1. Let $A$ be the set of all $(\lambda, \mu) \in \Omega_3$ that satisfy any one
of the four conditions (A.2), (A.3), (A.4) or (A.5) for some real numbers $c$,
$d$ and $r$ with $d > 0$ and $r > 1$. Let $A^-$ be the set of all $(\lambda, \mu) \in \Omega_3$ such that
$((\lambda_3, \lambda_2, \lambda_1), (-\mu_3, -\mu_2, -\mu_1)) \in A$. Finally, let $B$ be the set of all $(\lambda, \mu) \in \Omega_3$
such that $\Delta_3(\lambda, \mu)$ is symmetric or $\lambda_1\lambda_2\lambda_3 = 0$. Then $\Omega_3^0 = \Omega_3 \setminus (A \cup A^- \cup B)$.

Proof of Theorem A.1. The fact that $\Omega_3^0 \subset \Omega_3 \setminus (A \cup A^- \cup B)$ is
immediate. Now, let $Y \sim \Delta_3(\lambda, \mu)$ with $(\lambda, \mu) \in \Omega_3 \setminus (A \cup A^- \cup B)$ and
suppose that $Y - Y'$ is zero-symmetric, where $Y'$ is independent of $Y$ and
$Y' \sim \Delta_3(\lambda', \mu')$ for $(\lambda', \mu') \in \Omega_3$. We wish to show that $Y$ and $Y'$ have the
same distribution.

Since $Y$ is not symmetric, $Y'$ cannot be a point mass, so we may assume
without loss of generality that $\lambda'_1$ and $\lambda'_2$ are positive. To speed things along
a bit, we introduce several new variables. Let $\eta_1 < \eta_2 < \cdots < \eta_m$ denote the
support points of $Y - Y'$. Furthermore, let $\delta_j = \mu_{j+1} - \mu_j$ and $\delta'_j = \mu'_{j+1} - \mu'_j$,
$1 \le j \le 2$, denote the gaps between the elements of $\mu$ and $\mu'$. Finally, let $\alpha_j$
and $\alpha'_j$ equal $\lambda_j/\lambda_1$ and $\lambda'_j/\lambda'_1$, respectively, for $1 \le j \le 3$, so that we may
simplify calculations by working with the unnormalized vectors $(1, \alpha_2, \alpha_3)$
and $(1, \alpha'_2, \alpha'_3)$.

By the zero-symmetry of $Y - Y'$,

(A.6)    $P(Y - Y' = \eta_j) = P(Y - Y' = \eta_{m-j+1})$

for $1 \le j \le m$. We now consider the cases $\alpha'_3 \ne 0$ and $\alpha'_3 = 0$ separately.

Case A. $\alpha'_3 = 0$. By equation (A.6) with $j = 1$, we obtain $\alpha'_2 = \alpha_3$.
Because $\eta_2 - \eta_1 = \eta_m - \eta_{m-1}$, we have

(A.7)    $\min\{\delta_1, \delta'_1\} = \min\{\delta_2, \delta'_1\}$.

If $\delta_1$ and $\delta_2$ are both less than $\delta'_1$, then they must be equal by
equation (A.7); if they are both greater than $\delta'_1$, then they must be equal
because $\eta_1 + \delta_1$ is the opposite of $-\eta_1 - \delta_2$ by the zero-symmetry of $Y - Y'$. In
either case, equation (A.6) with $j = 2$ gives $\alpha_3 = 1$, which is a contradiction
because $Y$ cannot be symmetric.

If $\delta_1 = \delta_2 = \delta'_1$, then equation (A.6) with $j = 2$ gives $\alpha_2 + \alpha_3^2 = 1 + \alpha_2\alpha_3$.
Since $\alpha_3 \ne 1$ because $Y$ is not symmetric, this implies that $\alpha_2 = 1 + \alpha_3$ and
so either $Y$ or $-Y$ must satisfy condition (A.3), a contradiction.

If $\delta_1 > \delta'_1 = \delta_2$, then $\eta_1 + \delta_1$ must be zero because there is no other way
that $-(\eta_1 + \delta_1)$ could be attained by $Y - Y'$. This implies that $\delta_1 = 2\delta_2$. From
equation (A.6) with $j = 2$, we obtain $\alpha_3^2 + \alpha_2 = 1$ so that $\lambda \propto (1, 1 - \alpha_3^2, \alpha_3)$.
Setting $r = 1/\alpha_3$, we see this implies that $Y$ satisfies condition (A.2), a
contradiction. The case $\delta_2 > \delta'_1 = \delta_1$ leads to a similar contradiction in which
$-Y$ satisfies condition (A.2).

By equation (A.7), we have now exhausted case A.
Case B. $\alpha'_3 \ne 0$. By equation (A.6) with $j = 1$, we have $\alpha'_3 = \alpha_3$. In
analogy with equation (A.7), we obtain

(A.8)    $\min\{\delta_1, \delta'_2\} = \min\{\delta_2, \delta'_1\}$.

We may group the possibilities into the following four categories according
to the relative values of $\delta_1$, $\delta_2$, $\delta'_1$ and $\delta'_2$:

B1: $\delta_1 < \delta'_2$ and $\delta_2 < \delta'_1$, or $\delta_1 > \delta'_2$ and $\delta_2 > \delta'_1$.
B2: $\delta_1 < \delta'_2$ and $\delta_2 > \delta'_1$, or $\delta_1 > \delta'_2$ and $\delta_2 < \delta'_1$.
B3: $\delta_1 = \delta'_2 = \delta_2 = \delta'_1$.
B4: $\delta_1 > \delta'_2$ and $\delta_2 = \delta'_1$, $\delta_1 < \delta'_2$ and $\delta_2 = \delta'_1$, $\delta_1 = \delta'_2$ and $\delta_2 < \delta'_1$, or
$\delta_1 = \delta'_2$ and $\delta_2 > \delta'_1$.

The details of the following arguments are similar to those of Case A, so we
omit many of them. In the case B1, $Y$ may be shown to be symmetric, which
is a contradiction. In the case B2, we obtain $\alpha' = \alpha$, $\delta_1 = \delta'_1$ and $\delta_2 = \delta'_2$,
which means that $Y$ and $Y'$ have the same distribution. In the case B3,
either $\alpha_3 = 1$, which leads to a contradiction since $Y$ is then symmetric, or
$\alpha = \alpha'$, in which case $Y$ and $Y'$ have the same distribution.

Case B4 implies that three of $\delta_1$, $\delta_2$, $\delta'_1$ and $\delta'_2$ are equal, while the fourth
is at least double this common value. For the sake of illustration, suppose
that $\delta_1$ is the large value, so that

(A.9)    $\delta_1 \ge 2\delta_2 = 2\delta'_1 = 2\delta'_2$.

If equality holds in (A.9), then we may show that $Y$ satisfies condition (A.2)
and $Y'$ satisfies condition (A.3). On the other hand, if inequality in (A.9) is
strict, then $\delta_1$ must equal $3\delta_2$, in which case $Y$ satisfies condition (A.4) and
$Y'$ satisfies condition (A.5). Either outcome gives $(\lambda, \mu) \in A$, a contradiction.
A similar contradiction occurs when the role of $\delta_1$ is interchanged with that
of $\delta_2$, $\delta'_1$ or $\delta'_2$ in (A.9).

Since B1-B4 exhaust Case B, and Case A always leads to a contradiction,
we conclude that $Y$ and $Y'$ must have the same distribution. □

APPENDIX B: CONSISTENCY PROOFS 

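Proof of Lemma 1. For the case of finite $p$, we will show that
$\{d(\hat\lambda, \hat\mu)\}^p \to 0$ almost surely. It suffices to prove that

$\sup_{(\lambda,\mu)\in\Omega_k} |\{d(\lambda, \mu)\}^p - \{d_n(\lambda, \mu)\}^p| \to 0$ almost surely,

since $d(\lambda^0, \mu^0) = 0$ and $d_n(\hat\lambda, \hat\mu) \le d_n(\lambda^0, \mu^0)$ imply that

$\{d(\hat\lambda, \hat\mu)\}^p \le \{d(\hat\lambda, \hat\mu)\}^p - \{d_n(\hat\lambda, \hat\mu)\}^p + \{d_n(\lambda^0, \mu^0)\}^p$
$\le |\{d(\hat\lambda, \hat\mu)\}^p - \{d_n(\hat\lambda, \hat\mu)\}^p| + |\{d(\lambda^0, \mu^0)\}^p - \{d_n(\lambda^0, \mu^0)\}^p|$
$\le 2 \sup_{(\lambda,\mu)\in\Omega_k} |\{d(\lambda, \mu)\}^p - \{d_n(\lambda, \mu)\}^p|$.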
Define the functions

$\alpha(t) = \sum_{j=1}^{k} \lambda_j\{1 - F_0(\mu_j - t) - F_0(\mu_j + t)\}$

and

$\alpha_n(t) = \sum_{j=1}^{k} \lambda_j\{1 - F_n(\mu_j - t) - F_n(\mu_j + t)\}$.

Since $x \mapsto x^p - px$ is nonincreasing on $[0, 1]$ for $p \ge 1$, $b^p - a^p \le p(b - a)$
whenever $0 \le a \le b \le 1$. Therefore,

(B.1)    $||a|^p - |b|^p| \le p|a - b|$ for $|a|, |b| \le 1$.

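Since $|\alpha(t)|$ and $|\alpha_n(t)|$ are both less than or equal to 1, we obtain

$|\{d(\lambda, \mu)\}^p - \{d_n(\lambda, \mu)\}^p| = \left|\int_{-\infty}^{\infty} \{|\alpha(t)|^p - |\alpha_n(t)|^p\}\,dt\right|$

$\le p \int_{-\infty}^{\infty} |\alpha(t) - \alpha_n(t)|\,dt$

(B.2)    $\le p \sum_{j=1}^{k} \lambda_j \int_{-\infty}^{\infty} |F_0(\mu_j - t) - F_n(\mu_j - t)|\,dt + p \sum_{j=1}^{k} \lambda_j \int_{-\infty}^{\infty} |F_0(\mu_j + t) - F_n(\mu_j + t)|\,dt$

$= 2p \int_{-\infty}^{\infty} |F_0(t) - F_n(t)|\,dt$.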

Note that taking the supremum over $(\lambda, \mu) \in \Omega_k$ is now irrelevant. Let
$g_n(t) = \mathrm{sign}(t)\{F_n(t) - F_0(t)\}$ and let $g_n^+(t) = g_n(t) I\{g_n(t) > 0\}$ denote its
positive part. Let

$g(t) = F_0(t)$ if $t < 0$, and $g(t) = 1 - F_0(t)$ if $t \ge 0$,

and note that $0 \le g_n^+(t) \le g(t)$. Since $g(t)$ is integrable by the assumption of
a finite first moment and $g_n(t) \to 0$ almost surely by the Glivenko-Cantelli
theorem, $\int g_n^+(t)\,dt \to 0$ almost surely by the dominated convergence theorem.
Furthermore, $\int g_n(t)\,dt = E_{F_0}|X| - \frac{1}{n}\sum_{i=1}^{n} |X_i| \to 0$ almost surely by the
strong law of large numbers. Since $|g_n(t)| = 2g_n^+(t) - g_n(t)$, this proves that
the right-hand side of inequality (B.2) goes to 0 almost surely.
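
The identity for the integral of $g_n$ used above can be checked numerically. The sketch below is ours and assumes, purely for illustration, that $F_0$ is the standard normal distribution function, for which $E|X| = \sqrt{2/\pi}$. The integration range, the step count and the sample are arbitrary choices.

```python
import bisect
import math
import random

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def integral_gn(sample, lo=-12.0, hi=12.0, steps=48_000):
    """Midpoint-rule approximation of the integral of
    sign(t) * (F_n(t) - F_0(t)), with F_0 standard normal (an assumption)."""
    xs = sorted(sample)
    n = len(xs)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        fn = bisect.bisect_right(xs, t) / n  # empirical cdf F_n at t
        sign = 1.0 if t > 0 else -1.0
        total += sign * (fn - normal_cdf(t)) * h
    return total

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
expected = math.sqrt(2.0 / math.pi) - sum(abs(x) for x in sample) / len(sample)
```

Up to discretization error, `integral_gn(sample)` agrees with `expected`, that is, with $E_{F_0}|X|$ minus the sample mean of the $|X_i|$.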

The case of $p = \infty$ is much simpler. Repeated use of the triangle inequality
shows that

$\sup_t |\alpha_n(t)| \le 2 \sup_t |F_n(t) - F_0(t)| + \sup_t |\alpha(t)|$

and the same inequality holds if $\alpha_n(t)$ and $\alpha(t)$ switch places. Therefore,

$|d(\lambda, \mu) - d_n(\lambda, \mu)| \le 2 \sup_t |F_n(t) - F_0(t)|$

and the Glivenko-Cantelli theorem implies that as $n \to \infty$,

$\sup_{(\lambda,\mu)\in\Omega_k} |d(\lambda, \mu) - d_n(\lambda, \mu)| \to 0$ almost surely.

Note that no finite first moment assumption is required for $p = \infty$. □

In order to prove Lemma 2, we first define a new function $h(\lambda, \mu)$ and
show that it is uniformly continuous. The introduction of this function may
seem mysterious, but it is designed specifically to resemble the function
$d(\lambda, \mu)$, while at the same time possessing the crucial uniform continuity
property, the importance of which will be discussed further in the proof of
Lemma 2.



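Lemma B.1. For $1 \le p < \infty$, the function

(B.3)    $h(\lambda, \mu) = \int_{-\infty}^{\infty} \left|\sum_{j=1}^{k} \lambda_j\{1 - F_0(\mu_j - t) - F_0(\mu_j + t)\}\right|^p \times \sum_{j=1}^{k} \left[e^{-|\mu_j - t|} + e^{-|\mu_j + t|}\right] dt$

is uniformly continuous if $G_0$ has finite first moment.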
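Proof. By inequality (B.1),

$|h(\lambda, \mu) - h(\lambda', \mu)| \le p \sum_{j=1}^{k} |\lambda_j - \lambda'_j| \int_{-\infty}^{\infty} \sum_{j=1}^{k} \left[e^{-|\mu_j - t|} + e^{-|\mu_j + t|}\right] dt = 4kp \sum_{j=1}^{k} |\lambda_j - \lambda'_j|$.

Thus, $h(\lambda, \mu)$ is uniformly continuous in $\lambda$. Furthermore, $|a^p c - b^p d| \le
|d(a^p - b^p)| + |a^p(c - d)|$ and inequality (B.1) together imply that $|h(\lambda, \mu) -
h(\lambda, \mu')|$ is bounded above by

$2kp \sum_{j=1}^{k} \lambda_j \int \{|F_0(\mu_j - t) - F_0(\mu'_j - t)| + |F_0(\mu_j + t) - F_0(\mu'_j + t)|\}\,dt$

$+ \sum_{j=1}^{k} \int \left(\left|e^{-|\mu_j - t|} - e^{-|\mu'_j - t|}\right| + \left|e^{-|\mu_j + t|} - e^{-|\mu'_j + t|}\right|\right) dt$.

Since each of the above integrals is over the whole real line, each depends
on $\mu_j$ and $\mu'_j$ solely through the difference $\mu_j - \mu'_j$. The dominated convergence
theorem implies that each integral tends to zero as $\mu_j - \mu'_j \to 0$, which
establishes that $h(\lambda, \mu)$ is also uniformly continuous in $\mu$. □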

Proof of Lemma 2. We first consider the case of finite $p$ (the case
$p = \infty$ is much easier). Since $h(\lambda, \mu)$ defined in equation (B.3) is bounded
above by $2k\{d(\lambda, \mu)\}^p$, if Lemma 2 is false, then there exist $\varepsilon > 0$ and a
sequence

$\{(\lambda^n, \mu^n)\}_{n=1}^{\infty} \subset \{(\lambda, \mu) \in \Omega_k : \|(\lambda, \mu) - (\lambda^0, \mu^0)\| \ge \varepsilon\}$

such that $h(\lambda^n, \mu^n) \to 0$ as $n \to \infty$. We now show that this leads to a
contradiction.

Passing to a subsequence if necessary, we assume without loss of generality
that $\lambda^n \to \lambda^*$ and that each of the sequences $\mu_j^n$ has a limit, either finite or
infinite. By the uniform continuity of $h(\lambda, \mu)$ (Lemma B.1), we have

(B.4)    $h(\lambda^*, \mu^n) \to 0$.

Note that we cannot obtain an analogous expression using $d(\lambda, \mu)$ instead
of $h(\lambda, \mu)$ because $d$ is not uniformly continuous.

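A standard result in analysis states that if $f_n \to f$ in $L^1(\mathbb{R})$, then $f_n$ has
a subsequence that converges to $f$ almost everywhere (a.e.) with respect to
Lebesgue measure. Thus, passing to a subsequence if necessary, we see that
(B.4) implies

(B.5)    $\left|\sum_{j=1}^{k} \lambda_j^*\{1 - F_0(\mu_j^n - t) - F_0(\mu_j^n + t)\}\right|^p \times \sum_{j=1}^{k} \left[e^{-|\mu_j^n - t|} + e^{-|\mu_j^n + t|}\right] \to 0$ a.e.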

If some of the sequences $\{\mu_j^n\}$ are bounded, say $\mu_j^n \to \mu_j^*$, replacing these
sequences by their limits does not change (B.4) because of the uniform
continuity of $h(\lambda, \mu)$. Furthermore, in this case, the second sum in (B.5) is
bounded away from zero, which implies that the expression inside the
absolute value symbols tends to zero for almost all $t$. We conclude that if
$\{j : \mu_j^n \to \mu_j^*\}$ is nonempty, then

(B.6)    $\sum_{j:\,\mu_j^n \to \mu_j^*} \lambda_j^*\{1 - F_0(\mu_j^* - t) - F_0(\mu_j^* + t)\} + \sum_{j:\,\mu_j^n \to -\infty} \lambda_j^* - \sum_{j:\,\mu_j^n \to \infty} \lambda_j^* = 0$ a.e.

Letting $t \to \infty$ in equation (B.6) gives

$\sum_{j:\,\mu_j^n \to -\infty} \lambda_j^* - \sum_{j:\,\mu_j^n \to \infty} \lambda_j^* = 0$.

Hence, equation (B.6) implies that

$\sum_{j:\,\mu_j^n \to \mu_j^*} \lambda_j^*\{1 - F_0(\mu_j^* - t) - F_0(\mu_j^* + t)\} = 0$ a.e.,

which contradicts the assumption that $F_0$ is $k$-identifiable (note that we may
assume without loss of generality that $\lambda_j^* \ne 0$ whenever $\mu_j^n \to \mu_j^*$; otherwise,
this $j$th component may be entirely ignored). Therefore, every sequence $\{\mu_j^n\}$
goes to $\pm\infty$ as $n \to \infty$.

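Next, take $C$ to be an arbitrary constant that is not contained in the set
$\{\mu_1^0, \ldots, \mu_k^0\}$. Fix $j_0$ such that $\lambda_{j_0}^* \ne 0$, then define $a_j^n = \mu_j^n - \mu_{j_0}^n + C$ and
$b_j^n = \mu_j^n + \mu_{j_0}^n - C$ for $1 \le j \le k$ and $n \ge 1$. Under the change of variable
$s = C - \mu_{j_0}^n - t$,

$h(\lambda^*, \mu^n) = \int_{-\infty}^{\infty} \left|\sum_{j=1}^{k} \lambda_j^*\{1 - F_0(a_j^n - s) - F_0(b_j^n + s)\}\right|^p \times \sum_{j=1}^{k} \left[e^{-|a_j^n - s|} + e^{-|b_j^n + s|}\right] ds$.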

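Thus, passing to a subsequence if necessary, the argument leading to (B.5)
implies that

(B.7)    $\left|\sum_{j=1}^{k} \lambda_j^*\{1 - F_0(a_j^n - s) - F_0(b_j^n + s)\}\right|^p \times \sum_{j=1}^{k} \left(e^{-|a_j^n - s|} + e^{-|b_j^n + s|}\right) \to 0$ a.e.

Since $a_{j_0}^n = C$ for all $n$ by definition, the second sum in (B.7) is bounded
away from zero, implying that

(B.8)    $\sum_{j=1}^{k} \lambda_j^*\{1 - F_0(a_j^n - s)\} - \sum_{j=1}^{k} \lambda_j^* F_0(b_j^n + s) \to 0$ a.e.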

Passing to a subsequence if necessary, we assume that all of the sequences $a_j^n$
and $b_j^n$ have limits, either finite or infinite. For any of these sequences whose
limit is finite, we denote this limit by $a_j^*$ or $b_j^*$. Thus, we may decompose the
sum in (B.8) into parts according to the limits of the $a_j^n$ and $b_j^n$, as follows:

(B.9)    $\sum_{j:\,a_j^n \to a_j^*} \lambda_j^*\{1 - F_0(a_j^* - s)\} - \sum_{j:\,b_j^n \to b_j^*} \lambda_j^* F_0(b_j^* + s) + \sum_{j:\,a_j^n \to -\infty} \lambda_j^* - \sum_{j:\,b_j^n \to \infty} \lambda_j^* = 0$ a.e.

Letting $s \to -\infty$ in equation (B.9) gives

$\sum_{j:\,a_j^n \to -\infty} \lambda_j^* - \sum_{j:\,b_j^n \to \infty} \lambda_j^* = 0$.

Therefore, equation (B.9) implies that

(B.10)    $\sum_{j:\,a_j^n \to a_j^*} \lambda_j^*\{1 - F_0(a_j^* - s)\} = \sum_{j:\,b_j^n \to b_j^*} \lambda_j^* F_0(b_j^* + s)$ a.e.

Let $k_1$ and $k_2$ denote the number of finite $a_j^*$ and $b_j^*$, respectively. Note that
$k_1$ and $k_2$ must be positive by equation (B.10) because $a_{j_0}^* = C$ and $\lambda_{j_0}^* \ne 0$.
Furthermore, since $|\mu_j^n| \to \infty$ for each $j$, at most one of $\{a_j^n\}$ and $\{b_j^n\}$ can
remain bounded; thus, $k_1 + k_2 \le k$. (This gives an immediate contradiction
if $k = 1$.)

Equation (B.10) is an identity of distribution functions. It states that

(B.11)    $F_0^-(t) \star \Delta_{k_1}(\alpha, \xi) = F_0(t) \star \Delta_{k_2}(\beta, \eta)$,

where $\Delta_{k_1}(\alpha, \xi)$ is the distribution function supported on the finite $a_j^*$ with
weights proportional to the corresponding $\lambda_j^*$ and $\Delta_{k_2}(\beta, \eta)$ is the
distribution function supported on the finite $-b_j^*$ with weights proportional
to the corresponding $\lambda_j^*$. Recall that $F_0 = G_0 \star \Delta_k(\lambda^0, \mu^0)$. Define independent
random variables $Z \sim G_0$, $Y \sim \Delta_k(\lambda^0, \mu^0)$, $W_1 \sim \Delta_{k_1}(\alpha, \xi)$ and
$W_2 \sim \Delta_{k_2}(\beta, \eta)$. Then in terms of characteristic functions, equation (B.11)
becomes $\phi_{W_1}\phi_{-Y}\phi_Z = \phi_{W_2}\phi_Y\phi_Z$. We may cancel $\phi_Z$ from both sides because
$\phi_{W_1}\phi_{-Y}$ and $\phi_{W_2}\phi_Y$ are analytic functions that agree whenever $\phi_Z \ne 0$. This
leads to

(B.12)    $\frac{1}{2}(\phi_{-W_1}\phi_Y + \phi_{W_1}\phi_{-Y}) = \frac{1}{2}(\phi_{-W_1} + \phi_{W_2})\phi_Y$.

The left-hand side of equation (B.12) is a real function, which means that

$\frac{1}{2}\{\Delta_{k_1}^-(\alpha, \xi) + \Delta_{k_2}(\beta, \eta)\} \star \Delta_k(\lambda^0, \mu^0)$

is zero-symmetric. Since $k_1 + k_2 \le k$ and $(\lambda^0, \mu^0) \in \Omega_k^0$, we have

(B.13)    $\frac{1}{2}\{\Delta_{k_1}^-(\alpha, \xi) + \Delta_{k_2}(\beta, \eta)\} = \Delta_k^-(\lambda^0, \mu^0)$

by Theorem 1. But this is impossible, because the distribution on the left
side of equation (B.13) assigns nonzero weight to the point $-a_{j_0}^* = -C$ and
$C$ was chosen specifically so that the distribution on the right assigns no
weight at $-C$. This completes the proof for finite $p$.
If $p = \infty$, then

$d(\lambda, \mu) = \sup_t \left|\sum_{j=1}^{k} \lambda_j\{1 - F_0(\mu_j - t) - F_0(\mu_j + t)\}\right|$

is uniformly continuous in $\lambda$, so the $h(\lambda, \mu)$ function is unnecessary. Thus,
we assume the existence of a sequence $(\lambda^n, \mu^n)$ such that $\lambda^n \to \lambda^*$, each
$\mu_j^n$ has a limit and $d(\lambda^*, \mu^n) \to 0$. Equation (B.6) is true because $F_0(t)$ is
almost everywhere continuous; equation (B.8) is trivially true. The rest of
the proof for $p = \infty$ is identical to the proof for $p < \infty$, and we arrive at a
contradiction. □